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I.  Introduction 

In  these  lectures  we  discuss  parameter  identification 
problems  for  control  systems  that  frequently  arise  in  certain 
physiological  models.  By  a  parameter  identification  problem  we 
shall  mean  that  we  are  given  a  model  with  unknown  parameters 
along  with  observations  of  the  system  that  the  model  is  supposed 
to  represent  and  we  must  use  these  observations  to  determine 
values  for  the  unknown  parameters.  Specifically,  we  shall  focus 
our  attention  on  problems  for  nonlinear  models  involving  either 
(1)  delay-differential  and,  more  generally,  functional  differential 
equations  (FDE) ,  or  (2)  distributed  parameter  systems  realized  by 
partial  differential  equations  (PDE) .  In  the  first  case,  we 
shall  discuss  techniques  that  are  applicable  to  problems  that 
contain  unknown  delays  as  well  as  unknown  transport  coefficients 
among  the  parameters.  In  the  case  of  (2)  we  present  methods  for 
problems  which  contain  unknown  diffusion,  solubility,  and  other 
transport  coefficients.  In  both  cases  the  methods  can  be  used  in 
problems  with  unknown  parameters  in  the  initial/boundary  data. 

Briefly,  our  presentation  will  be  as  follows.  First  we 
give  motivating  examples  consisting  of  models  arising  in  respiratory- 
circulatory  physiology  and  renal  physiology.  Before  discussing  some 
inherent  difficulties  in  parameter  estimation  for  the  resulting 
classes  of  problems,  we  review  standard  methods  available  for 
parameter  identification  of  ordinary  differential  equation  models. 
Included  for  mention  in  a  least-squares  formulation  are  descent 
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methods  such  as  gradient,  conjugate-gradient,  and  certain  quasi- 
Newton  algorithms.  Finally,  we  explain  how  one  can  use  Ritz- 
Galerkin  ideas  to  employ  the  standard  techniques  in  a  way  that 
circumvents  the  difficulties  mentioned  and  produces  convergent 
identification  algorithms  for  problems  involving  nonlinear  FDE 
and  PDE. 

We  provide  an  adequate  but  not  exhaustive  bibliography. 
Many  of  the  publications  we  have  selected  to  cite  here 
include  rather  extensive  reference  to  the  current  research 
literature. 
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2 .  Models  from  physiology  involving  delay  and  distributed 
parameter  systems 

We  first  consider  respiratory  models  for  humans  which  are 
formulated  with  the  overall  schematic  depicted  in  Figure  1  in  mind. 
These  models  are  often  ones  which  combine  (1)  the  respiratory 
system  (based  on  gas  dynamics)  with  compartments  such  as  alveolar 
space,  dead  space,  airways,  and  atmosphere  which  are  connected 
to  a  "gas  pump"  and  (2)  the  circulatory  system  (often  based  on 
transport  theory  and  fluid  dynamics)  with  compartments  such  as 
pulmonary  vein,  left  heart,  aorta,  arteries,  tissue,  veins,  right 
heart,  and  pulmonary  artery.  The  respiratory  and  circulatory 
components  are  connected  via  a  diffusion  compartment  representing 
exchange  of  the  principal  gases  N2,C>2,  H20  and  C02  which  are 
the  subject  of  investigations  in  these  models  [13],  [17],  [19], 
[24],  [25] ,  [26] ,  [27] ,  [33] , [34]. This  gas  exchange  takes  place,  of 
course,  during  the  flow  of  the  blood  from  the  pulmonary  artery  to 
the  pulmonary  vein  as  it  perfuses  the  alveoli. 

Many  of  the  models  entail  systems  of  ordinary  differential 
equations  and  the  more  realistic  ones  usually  take  into  account 
transport  delays  and  thus  involve  delay-differential  equations 
or  functional  differential  equations .  Briefly  one  uses  understand¬ 
ing  of  the  basic  biomechanical  and  biochemical  mechanisms  of  the 
systems  to  write  mass  balance  equations.  These  equations  relate 
the  alveolar  partial  pressures  PA^j»  brain  concentrations  CB^j, 
and  tissue  concentrations  ^T(j)  ^or  var^ous  species 
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Figure  1 
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j  -  Oj»  C02'  H20'  N2  to  ot^er  variables  such  as  arterial  con¬ 
centrations  ca(j)»  venous  concentrations  Cv^j,  inspired  and 
expired  volumes  V^,  VE,  and  control  parameters  consisting  of 
blood  flow  rate  QB  and  respiratory  minute  volume  RMV.  These 

latter  variables  are  under  CNS  control  via  sensors  for  C  , . . 

a  (] ) 

in  the  carotid  body  and  ^b(Cq  j in  the  brain.  Equilibrium 
equations  relating  p^(j)  and  Ca(j)  must  also  be  written. 

To  illustrate  how  the  transport  delays  are  incorporated,  we 
refer  to  the  simplified  schematic  of  Figure  2.  From  this,  we  see 
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that  transport  delays  t  come  into  play  when  relating  the 
arterial  concentrations  CaT(j)^  at  brain  anc^ 

tissue  entrance  to  the  arterial  concentrations  Ca^j(t)  at  the 
lung  exit.  Delays  are  also  appropriate  in  relating  the  mixed 


venous  concentrations  cv(j)  entering  the  lungs  with  the  venous 
concentrations  ('VB(j)'  *"vT(j)  at  brain  anc^  tissue  exits. 

Using  delay  parameters  as  depicted  in  Figure  2,  we  have 


C 

C 
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aT  ( j ) 
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v  ( j ) 
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Ca ( j ) (t_xaB) 
Ca ( j ) (t-TaT) 


g(CvB(j)  (t-TvB)  '  CvT  ( j  )  (t-Vr*  J 


The  delays  themselves  are  in  general  functions  of  some  of  the 

variables  (e.g.,  Qq  and  thus  ca(o  j  and  Ca(CO  the 

2  2 


models.  However,  often  it  is  useful  in  employing  the  models  in 
studies  to  select  "apparent"  values  for  the  delays  by  fitting 
them  to  data.  Delays  can  also  arise  in  modeling  the  feedback 
control  loops  (see  [34])  since  the  sensors  for  CO^  are  in  the 
brain  and  carotid  body  while  control  is  at  the  level  of  expired 
volume  rate  Vg.  That  is,  instead  of  RMV  one  might  equivalently 
focus  on  the  control  V£  =  h  (PA(C0^(t-T^)  ,  PA(Co^t-T2^  where 


t  ^ ,  t 2  are  transport  times  from  the  alveoli  to  the  brain  and 
carotid  body  respectively. 

However  the  detailed  model  equations  are  formulated,  it 
should  be  clear  that  one  is  lead  naturally  to  feedback  control 


systems  with  delays 


(2.1)  x(t)  =  f (a,x(t) ,x(t-T^) , . . . ,x(t-t  ) ) 

in  which  one  often  wishes  to  "identify"  the  parameters 
q  =  (af xlf . . . ,tv)  . 

Models  for  respiration  in  certain  insects  have  also  been 

investigated  [11],  [12]  and  these  models  involve  a  completely 

different  type  of  system  for  which  identification  procedures  are 

needed.  Consider  a  tube  (trachea  -  through  which  gas  flows)  of 

2 

length  L  and  radius  r  (cross-section  S  =  itt  )  as  depicted 
in  Figure  3.  Denote  by  e  the  thickness  of  the  wall  of  the  tube 
and  by  S.  the  cross-section  of  blood  surrounding  the  tube. 
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Using  mass  balance  and  considering  (i)  diffusion  along  the  axis 
of  the  tube  (radial  diffusion  assumed  negligible),  (ii)  convection 
related  to  movement  of  the  gas,  and  (iii)  diffusion  across  the 
longitudinal  wall,  one  can  write  equations  which  the  partial 
pressures  of  oxygen,  carbon  dioxide,  and  nitrogen  must  satisfy. 
Letting  y^,y2ry2  denote  the  partial  pressures  in  the  tube  of 
,  CO.,,  N2  respectively  and  w^,w2,w2  denote  the  corresponding 
partial  pressures  in  blood  of  these  substances,  one  finds  (the 
first,  second,  and  third  terms  on  the  right  in  each  equation 
represent  axial  diffusion,  convection  and  diffusion  across  the 
wall  respectively) : 


(2.2) 


ayj 

at 

=  qj 

3  yi  _  l 

.  2  PS 

a  x 

a 

ax 

(Vy.) 

-  Dj  ^  (yj-wj). 

3W. 

at 

=  qj 

3S  6 

.  2  S, 

ax  l 

3Wj 

ax 

j  =  1,2,3. 


Here  the  parameters  q^,q^,D^  represent  coefficients  of  diffusion, 
V  is  the  velocity  of  the  convective  flow  of  the  gas,  P  is 
atmospheric  pressure,  and  the  Oj  denote  coefficients  of 
solubility.  Of  course,  Q  is  the  blood  flow  rate.  Among  the 
important  parameters  to  be  estimated  in  use  of  such  a  model  are 
the  diffusion  and  solubility  coefficients  qj,qj,D j,°j. 

Systems  of  partial  differential  equations  arising  from 
convective-diffusive  phenomena  also  are  found  in  numerous  models 


9. 


for  renal  function.  These  models  are  usually  based  on  mechanisms 
present  in  the  nephron,  the  basic  functional  unit  of  the  kidney 
(approximately  one  million  per  kidney)  for  which  a  schematic  is 
given  in  Figure  4.  Among  the  important  mechanisms  for  transport 
between  the  blood  and  fluid  in  the  nephron  are  filtration  (transport 
from  the  plasma  to  glomerular  fluid  in  Bowman's  capsule),  reabsorption 
(from  the  tubule  back  to  the  plasma;  some  of  this  involves  active 
transport,  some  passive) ,  and  secretion  (transport  from  plasma  to 
urine  which  can  be  active  or  passive) .  For  example,  as  fluid  moves 
down  the  tubule,  NaCl  is  removed  actively,  H20  is  removed 
passively  from  the  tubule,  leaving  the  concentration  of  urea  higher 
in  the  tubule  than  in  the  perfusing  plasma.  Urea  is  then  reabsorbed 
passively.  The  flow  velocity  in  the  tubule  thus  affects  directly 
the  amount  of  urine  produced.  For  a  high  velocity  flow  only  about 
40%  of  the  urea  is  reabsorbed  whereas  for  low  velocity  flow  (and 
hence  a  high  rate  of  removal  of  NaCl  and  J^O)  up  to  80%  of  the 
urea  is  reabsorbed. 

Since  approximately  7/8  of  the  total  number  of  nephrons  in 
a  kidney  are  located  in  the  cortex,  much  of  the  modeling  in  the 
literature  (e.g.,  see  [10] ,  [30]  ,  [31]  ,  [32] )  is  focused  on  function 
in  the  renal  cortex.  Mathematical  models  are,  not  surprisingly, 
based  on  principles  of  conservation  of  mass  and  chemical  species 
as  well  as  the  theory  of  non-equilibrium  thermodynamics.  A  typical 
one-dimensional  compartmental  model  might  be  derived  employing  a 
schematic  as  depicted  in  Figure  5.  The  basic  equation  for  such 


..vT 


(descending) 


models  is  a  one-dimensional  convection-diffusion  equation  of  the 
form 


(2.3) 


a 

at 


(p  j  A) 


A)  +  (VPj) 


J.S 

3 


where  Pj  is  the  mass  density  of  species  j,  A  is  the  cross 

sectional  area  of  the  flow  region,  V  is  the  volumetric  flow 

aP 

rate  in  the  axial  direction  x,  q.  x — **-  is  the  axial  mass  flux, 

]  <7  X 

S  is  the  wall  surface  area  per  unit  length  and  is  the  mass 

flux  of  species  j  across  the  wall.  We  note  that  these  equations 
result  in  a  model  of  the  same  general  form  as  that  given  in  (2.2). 


From  To  Collecting  Cortical 

Glomerulus  Duct  Peritubular 


Medul  1  a 

Figure  5 
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Summarizing,  we  see  that  models  for  respiratory  and  renal 
physiological  control  systems  typically  yield  two  basic  types  of 
systems  for  which  parameter  identification  methods  are  required. 

One  may  have  a  delay  system  of  the  form  (2.1)  in  which  one  seeks 
to  estimate  parameters  q  =  (ct ,  ,  . .  .  ,  t y )  including  transport 

coefficients  and  the  delays.  Or  one  may  be  faced  with  identifying 
parameters  including  diffusion  and  solubility  coefficients  in 
parabolic  distributed  parameter  (i.e.,  partial  differential  equation) 
systems  such  as  (2.2)  and  (2.3).  In  both  cases  one  must  impose 
appropriate  boundary  and/or  initial  conditions  and  these  also  may 
contain  unknown  parameters  for  which  estimates  are  needed. 

These  problems  share  certain  inherent  difficulties  which  we 
shall  mention  in  section  4.  Before  doing  so,  however,  it  is 
necessary  to  review  standard  techniques  that  are  readily  available 
for  parameter  identification  in  ordinary  differential  equation 
models. 
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.  Parameter  identification  for  ordinary  differential  equations 


Consider  a  system  which  is  modeled  by  an  ordinary  differential 


equation  in  Rr 


x ( t )  =  f (q , t , x (t ) ) ,  0  <  t  <  T, 


x  (0 )  =  xQ, 


where  q  is  a  vector  parameter  in  Ru  to  be  determined  by  ob- 

f 

servations  of  the  system.  A  set  of  observations  y^  £  R  , 
i  =  1 , . . . ,m,  for  y(t.)=Cx(t.)#  0  <  t,  <...  <  tm  <  T  is 
given  with  C  an  l  x  n  matrix.  One  wishes  to  use  these  to 
select  a  "best-fit"  value  q  in  Q;  here  Q  is  a  given  compact 
constraint  set  (admissible  parameter  values) .  A  rather  standard 
formulation  is  the  least-squares  fit-to-data  in  which  one  seeks 
to  choose  q  6  Q  so  as  to  minimize 


(3.2) 


J(q)  =  J  l  |y  (t. ;q)  ~  Y i\ 
i  =  l 


where  y(t^;q)  =  Cx(t^jq)  with  x  the  solution  of  (3.1)  corres¬ 
ponding  to  q.  In  practice,  one  turns  to  iterative  methods  to 
solve  such  problems. 

Among  the  minimization  techniques  one  might  consider  for 
these  problems  are  those  classified  as  direct  search  methods  (for 
a  detailed  discussion  of  some  typical  methods  in  this  class,  see 
[10]).  These  methods  consist  of  procedures  which  generate  a 
sequence  of  trial  solutions  for  minimizing  J.  Examination  of  a 


r  (■ 
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given  trial  solution  is  by  simple  comparison,  the  result  being 
used  to  indicate  further  steps  in  the  search  procedure.  These 
methods,  which  make  use  of  only  functional  evaluations,  are 
attractive  in  some  cases  if  one  suspects  that  the  function  J  to 
be  minimized  is  not  smooth,  but  they  are  slow  and  usually  are  quite 
inefficient  when  highly  accurate  solutions  are  desired.  Further¬ 
more,  they  have  been  developed  heuristically  and  no  proofs  of 
convergence  have  been  given.  Indeed,  while  they  might  be  quite 
useful  in  indicating  the  general  location  of  a  minimizing  point, 
often  their  performance  with  regard  to  actual  convergence  is  poor. 
One  might  expect  that  iterative  methods  which  involve  use  of  more 
of  the  information  available  about  J  (e.g.,  derivatives  of 
various  orders)  would  give  superior  performance  in  practical 
situat ions . 

Among  a  number  of  nrthods  of  this  latter  type  are  seme  in  a  general 

class  of  methods,  descent  methods,  which  generate  a  sequence  { q  >  of 

k+1  k 

approximations  so  that  J (q  )  <  J (q  ) .  Recall  that  in  many  cases 
minimizing  J  is  equivalent  to  seeking  solutions  of  J'(q)  *  0. 

If  one  applies  Newton’s  method  to  this  latter  problem,  one  obtains 
the  iterative  procedure 

(3.3)  qk+1  =  qk  -  ( J" (qk) ] _1J ' ( qk ) • 


We  further  recall  that  this  is  not  a  descent  method,  but  under 
reasonable  assumptions  one  can  prove  (see  (29] )  that  there  do  exist 
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>  0  such  that 


(3.4) 


k+1  k  rT„ ,  k. ,-lT1 ,  k. 

q  =  q  -  aktJ  (q  )  1  J  (q  ) 


is  a  descent  method.  This  iterative  formula  defines  the  Damped 
Newton 1 s  method,  which  is  a  special  case  of  the  general  iterative 
procedures  defined  by 


(3.5) 


k+1  k  ,  k 
q  =  q  +  a^p 


where  the  "p  "  are  called  directions  for  the  method.  The 
analysis  and  development  of  many  methods  are  concerned  with  how 

V 

one  is  to  choose  the  "directions"  p  and  the  "steplengths"  ak> 

To  motivate  a  choice  of  "good"  directions  p  ,  consider  a 

function  J  that  is  "quadratic"  or  "elliptic"  in  its  behavior 

_  2 

near  a  minimizing  point  q  in  R  .  (A  similar  argument  can  be 

carried  out  in  R1' . )  Then  the  level  surfaces  of  J  (actually 

their  projection  onto  the  R^  plane,  i.e.,  { q | J ( q)  =  M)  for 

constants  M)  are  given  by  closed  curves  as  depicted  in  Figure  6. 

At  any  point  q  on  a  level  curve,  the  gradient  vector 

J ' (q)  =  VJ (q)  =  (- — ,- — )  is  an  outward  normal  vector.  It  is 

3gl  dq2 

clear  that  for  "good"  directions  to  move  toward  q,  one  wants  to 
choose  directions  that  are  "downhill",  i.e.,  directions  given  by 
vectors  p  satisfying  p*J'(q)  <  0.  Under  reasonable  assumptions 


I 


Figure  6 


on  J  (see  [21],  [29]),  one  can  show  that  if  the  p  in  (3.5) 

are  chosen  in  a  "downhill"  manner,  then  it  is  possible  to  choose 

the  steplengths  (a^)  so  that  the  resulting  iterative  procedure 

k 

is  a  descent  method  and  the  iterates  {q  )  converge  to  a  solution 
q  of  J' (q)  =  0. 

From  the  geometric  considerations  above,  one  can  surmise 
that  the  "best"  direction  would  be  in  the  direction  of  -J'  since 


clearly  no  other  direction  can  give  a  larger  local  decrease  in  J. 

Ideas  such  as  this  are  behind  the  steepest  descent  (or  gradient) 
k  k  k 

method  where  p  is  chosen  as  p  =  -J'(q  ),  which  in  the  case 


of  (3.2)  results  in  the  iterative  formula 


(3.6) 


k+1  k  .  k. 

q  =  q  -  a  J’ (q  ) 


where  J'(qk)  =  l  f^-(t .  ;qk)  T{y  (t .  ;qk)  -  y.  }.  (We  assume  here 

that  one  is  using  the  usual  Euclidean  norm  in  .  For  "steepest 
descent"  directions  with  respect  to  other  norms  in  Rw ,  see  [29, 
p.245].)  While  (3.6)  represents  an  intuitively  appealing  choice 
based  on  a  "best"  local  strategy,  the  convergence  properties  of 
the  resulting  method  are  unfortunately  much  poorer  than  one  might 
expect.  One  heuristic  explanation  sometimes  offered  for  this  poor 
performance  is  phrased  as  an  "instability  under  small  perturba¬ 
tions"  and  is  depicted  in  Figure  7. 


18. 


In  this  figure  q  represents  the  point  given  by  precise  computa¬ 
tion  while  q  is  the  actual  computed  value  (different  from  q 
due  to  machine  errors  and  the  failure  to  compute  exactly  the 
previous  descent  step) .  From  the  figure  it  is  clear  that  in 
extreme  situations  one  could  actually  obtain  a  direction  almost 
orthogonal  to  the  one  desired.  In  reality  the  explanation  for  the 
poor  performance  of  the  steepest  descent  method  in  the  general 
case  is  somewhat  more  involved  and  is  probably  related  to  the  fact 
that  the  steepest  descent  directions  (when  the  method  is  applied 
to  a  quadratic  function  J (q)  =  qAq,  A  >  0)  are  in  the  limit 
asymptotic  to  just  two  directions  [21] .  Even  though  for  "nice" 

J  (e.g.f  J (q)  =  qAq  with  A  >  0)  one  can  establish  that  con¬ 
vergence  is  geometric,  i.e.. 


J(qk+1)  <iVV  J(qk, 
'  (A/A^2 


where  are  eigenvalues  of  A,  it  is  fair  to  say  that  the 

practical  usefulness  of  the  steepest  descent  method  in  locating 
precisely  minimizing  points  for  general  functions  J  is  probably 
overrated.  (It  is  common  to  observe  extremely  slow  convergence 
for  this  method  in  the  vicinity  of  the  minimizer.) 

In  a  can?ful  analysis  of  the  difficulties  with  convergence 
of  the  steepest  descent  method,  one  actually  discovers  that  the 
method  essentially  tries  to  approach  a  minimum  in  a  two-dimensional 
subspace  via  use  of  a  very  limited  choice  of  directions  in  the 


iterative  formula  (3.6).  One  way  to  alleviate  this  problem  is 

to  ensure  that  one  uses  an  adequate  supply  of  directions  by  choos- 
th  k 

ing  the  k  direction  p  in  (3.5)  so  that  it  is  mutually 
orthogonal  to  p^  p^  ^ , . . .  ,  p^  This  idea  leads  to  the 

conjugate  gradient  or  conjugate  directions  methods,  the  most  use¬ 
ful  general  minimization  methods  currently  available.  The  basic 
idea  underlying  these  methods  grew  out  of  studies  of  minimization 
techniques  for  quadratic  functionals  J (q)  =  j  gAq  +  B*q  +  E, 

A  >  0,  on  .  These  studies  [29]  reveal  that  if  one  chooses 
mutually  A-orthogonal  directions  p‘'",...,py  (i.e.,  p1Ap')  =  6  —  ) 

and  uses  them  in  the  iterative  formula  (3.5)  with  a  proper  choice 
of  the  (obtained  by  minimization  procedures  -  see  the 

V  , 

discussion  below) ,  the  resulting  sequence  {q  )  converges  m  at 
most  p  steps  to  the  unique  minimizing  point  for  J.  (For  a 
derivation  of  the  conjugate  direction  methods  based  on  Fourier 
expansion  ideas  which  also  yield  the  above  results,  see  Chapter 
10  of  [23] . )  Before  turning  to  a  description  of  one  popular 
generalization  of  these  conjugate  direction  methods,  we  mention 
briefly  that  there  are  several  possible  ways  for  choosing  the 
steplengths  in  (3.5).  Among  so-called  step-length  algorithms 

probably  the  most  often  used  are  those  based  on  the  minimization 
principle 


J(qk+akpk)  =  min{ J (qk+apk) | a  e  R1}. 


(3.7) 
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That  is,  one  chooses  so  that  J  is  minimized  along  the 

line  {q|q=q  +  ctp  ,  a  e  R  } .  This  clearly  always  leads  to  a 
descent  method.  In  complex  problems  one  usually  does  not  insist 
on  precisely  the  minimizing  a,  but  may  use  a  one-dimensional 
search  or  some  type  of  interpolation  scheme  to  find  an  approximate 
to  the  minimizing  a.  For  a  discussion  of  other  methods  (Curry, 

Altman,  ma jorization,  Goldstein)  in  addition  to  (3.7)  see  [29, 
p. 249-257] . 

One  of  the  best  known  conjugate  direction  methods  (the 
Fletcher-Reeves  algorithm)  uses  the  minimization  principle  for 
choice  of  step-lengths  and  chooses  the  direction  "near"  to  those 
of  steepest  descent  or  gradient,  i.e.,  those  in  (3.6),  but 
modified  to  give  conjugate  directions  (in  the  event  J  is  quadratic) . 

It  is  commonly  referred  to  "the"  conjugate-gradient  method.  (This 
is  something  of  a  misnomer  since  there  are  other  algorithms  — e . g. , 
that  of  Daniel  [29]  -  that  are  also  called  conjugate-gradient  methods) . 
Letting  denote  the  gradient  of  J  at  q^,  i.e.,  Gk  =  J'  (qk) , 

we  define  this  iterative  procedure  by 


(3.8) 

(3.9) 


Pktl  =  -Gk+1  +  6kPk 


ev  -  |Gk+1|2/|Gk|2 


k+1  _  k  k 

\  ~  q  +  aup 


(3.10) 
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where  is  chosen  according  to  the  minimization  principle  (3.7). 

We  note  that  if  G  =0,  we  have  that  q  is  a  critical  point  of 
J  and  the  algorithm  is  terminated.  In  the  event  that  J  is 
quadratic  one  can  show  that  the  directions  (3.8)  are  conjugate 
directions.  The  method  is  obviously  a  descent  method.  In  practice 
when  applying  this  algorithm  to  problems  with  general  J  it  is 
common  to  restart  the  method  every  step  (or  every  (p+l)st 

step  as  suggested  in  [16])  by  taking  a  purely  steepest  descent 
step.  That  is,  one  replaces  (3.9)  by 


< 


0  if  k  +  1  =  Mu  for  some  positive  integer  PI 
| Gk+1 | 2/ | Gk | 2  otherwise. 


For  a  discussion  of  the  convergence  properties  of  the 
Fletcher-Reeves  algorithm,  one  should  consult  [29] .  We  note  that 
some  authors  [21]  erroneously  assume  that  convergence  of  these 
algorithms  in  a  finite  number  of  steps  for  quadratic  J  automatically 
implies  quadratic  convergence  in  the  general  situation.  For  general 
J  the  methods  appear  not  to  be  that  well-behaved  [29,  p.512].  Other 
conjugate-direction  type  methods  are  also  discussed  in  Chapter  8  of 
[29]  . 

A  class  of  popular  quasi-Newton  methods  that  are  related  to 

the  above  procedures  is  based  on  applying  the  Gauss-Newton  scheme 

•  k  «  • 

to  a  modified  functional  J.  Given  an  estimate  q  ,  one  linearizes 


y(t^;q)  in  (3.2)  about  q  obtaining  the  modified  functional 


(3.11)  Jk(q)  =  J  |  |y(t.;qk)  +  §J  (t. ; qk) { q-qk} -y . | 2 . 


One  then  chooses  q  =  qK  so  as  to  minimize  this  functional  or 
.  ~  1  k+l 

rather,  to  satisfy  Jk(q  )  =  0.  This  yields  an  equation  for 

k+l  •  w 
q  given  by 


V  3y /*.  h.T  3y  ,  k.  r  k+l  k.  v  ay..  k,Tr  k.  ' 

^(ti;q  >  a^(ti;q  ){q  _q  }  =  aq(ti;q  >  ty^i'-g  )-yi> 

or,  using  the  notation  3^  =  ~^-(t^;qk)  , 


.12)  qk+1  =  qk  -  {J  3ki)  .  I  < ^i* V  <4  '  ^i  > 

1=1  1=1 


-1  m 


A  modification  of  this  algorithm  (usually  called  the  Levenberg- 
Marquardt  algorithm)  is  used  in  a  standard  IMSL  package  that  is 
widely  available  and  involves  the  iterative  formula 


(3.13)  qk+1  -  qk  -  <W  f  3  J  9ki>  ‘  j  t>kllVltiiqk)  -  y,) 

1=1  1=1 


where  =  diag(  £  a^  3^)  and  X.  is  a  parameter  which  can 

i=l 

be  chosen  to  insure  that  (3.13)  is  a  descent  method. 

In  all  of  the  above  procedures  ((3.6),  (3 . 8) - ( 3 . 10) ,  (3.12), 

(3.13)),  one  needs  |^,  which  satisfies  a  linearized  variational 

aq 

equation,  as  well  as  y,  which  satisfies  the  original  nonlinear 


equation  (3.1)  (or  rather  x  does  and  y  =  Cx,  ^  =  C  |^-)  .  These 

must  be  computed  at  each  iterative  step,  even  though  the  iteration 

itself  is  in  a  finite-dimensional  parameter  space  (recall  q  €  Q  c  R 

Thus  at  each  step  of  the  algorithms,  one  must  solve  (3.1)  for 
lc  k  9  v 

t  -*■  y (t ; q  )  =  Cx(t;q  ).  For  -J-  one  can  either  solve  for  the 

dq 

fundamental  matrix  4>^  of  the  linear  variational  equation 


$k(t)  =  H'(c3k/t,x(t;qk) )  <J>k  ( t: ) 


and  then  employ  a  standard  representation  formula  for  —  in  terms 

3q 

of  <J>k  (see  [6]),  or,  one  can  (this  is  most  often  done)  use  a 
difference  approximation  for 

aq 

While  the  above  methods  lead  to  an  iteration  in  a  finite¬ 
dimensional  parameter  space,  there  is  another  widely  publicized 
method,  quasilinearization  (see  [6] ,  and  the  references  therein) , 

which  leads  to  a  simultaneous  iteration  in  a  function  space  and 

k  k  k+1  k+1 

parameter  space.  Given  x  and  q  ,  one  lets  t  -*■  x  (t;q  ) 

be  the  solution  of 

'k+1,..  3f,  k  .  k,...r  k+1,,.  , 

x  (t)  =  —  (q  ,t,x  (t)){x  (t)  -  x  (t)  }  + 

d  X 


(3.14) 


3 f ,  k  .  k,..w  k+1  k.  .  k  .  k,... 

—  (q  , t , x  (t)){q  -q  }  +  f (q  ,t,x  (t)) 
dq 


k+1 ,  n . 

x  (0)  =  xq. 


where  q 


is  chosen  so  as  to  minimize 


(3.15) 


1 

2 
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J  (q 


k+1, 


m 


l  |Cx 

i=l 


k+1 


(ti;q 


k+1 


That  is,  we  require  that  (3.14)  and 


(3.16) 


m 

y  l 


i=l 


|Cxk+1(ti;qk+1) 


0 


k+1  k+1 

be  satisfied  in  choosing  x  ,  q  .  The  algorithm,  which  is 

explained  in  detain  in  (6J ,  entails  the  following  idea.  Observe 
k  k 

that,  given  x  ,q  ,  the  equation  (3.14)  is  a  linear  variational 
k+1  k+1 

equation  for  x  in  terms  of  q  .  One  can  thus  use  a 

k+1  k+1 

representation  formula  for  x  in  terms  q  and  this  can 

k+1 

be  substituted  into  (3.16)  which  is  then  solved  for  q 

Early  advocates  of  the  quasilinearization  algorithm  claimed 
that  it  converged  quadratically  (when  it  converged) .  If  this  were 
true,  it  might  offer  significant  advantage  over  standard  routines 
such  as  the  Gauss-Newton  (3.12)  or  (3.13)  above  which  is  usually 
quadratically  convergent  only  when  the  model  is  exact  (i.e.,  there 
exists  a  choice  q  so  that  J(q)  =0  in  (3.2)). 

Both  theoretical  and  numerical  studies  comparing  the  Gauss- 
Newton  and  quasilinearization  algorithms  have  been  carried  out  and 
the  results  reported  in  [1],  [6],  [18].  For  the  identification 

problems  formulated  here,  it  was  found  that  from  a  theoretical 
viewpoint,  the  convergence  properties  of  both  the  quasilinearization 
and  Gauss-Newton  algorithms  are  essentially  equivalent  (quadratic 
convergence  when  there  is  an  exact  fit  of  the  model  to  the  data; 
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3T” 


at  best  linear  convergence  otherwise) .  In  the  numerical  studies 
little  difference  between  the  algorithms  was  found  in  the  number 
of  iterations  to  convergence.  The  quasilinearization  algorithm 
was  faster  per  iteration  (requiring  fewer  equivalent  function 
evaluations  than  Gauss-Newton)  but  required  much  larger  amounts 
of  storage.  The  GaUss-Newton  algorithm  was  usually  more  accurate 
and  much  less  complex  to  program. 

Results  of  these  investigations  appear  to  favor  a  simple 
Gauss-Newton  algorithm  applied  directly  to  J  over  the  quasi¬ 
linearization  algorithm  for  problems  of  the  type  formulated  in 
this  section. 

A  slightly  different  (but  in  some  respects  equivalent) 
formulation  of  the  parameter  identification  problem  for  (3.1) 
seeks  a  value  q  that  maximizes  a  "likelihood"  function  instead 
of  a  minimizer  for  J  given  in  (3.2).  That  is,  one  seeks  to 
choose  q  £  Q  so  as  to  maximize  the  likelihood  function 

m 

(3.17)  L (q)  =  l  in  9 (y • -y (t . ; q) ) , 

i=l 

a  heuristic  foundation  of  which  we  shall  explain  briefly  here.  (A 
more  detailed  explanation  of  these  procedures  can  be  found  in 
almost  any  standard  text  on  estimation  -  e.g.,  see  (14],  [15],  [22], 
[28].)  A  solution  q  of  this  problem  is  then  called  a  maximum 
likelihood  estimator  (MLE)  for  q* ,  the  "true"  parameter  value 
(which,  of  course,  may  not  exist).  MLE  algorithms  are  based  on 


'.*1  -■ 


the  following  considerations.  The  observations  are 

assumed  to  be  corrupted  by  random  measurement  noises  (N  , 

so  that  we  may  write  y^  =  y(t^;q*)  +  ,  i  =  l,...,m.  Assuming 

that  are  independent  random  variables  with  identical 

probability  density  functions  g,  the  joint  density  function  is 

m 

then  given  by  g  ( n-,  ,  n9 ,  .  .  •  ,  n  )  =  n  g(n-).  Intuitively,  the 

i  z  m  i=1  i 

function  g  should  possess  a  maximum  at  those  values  of 
(nj_ »  .  •  •  »  nm)  that  are  most  likely  to  occur.  A  procedure  for 
estimating  q*  might  reasonably  be  devised  on  the  basis  that  the 
observed  values  of  =  y^  -  y(t^;q*)  correspond  to  those  that 

are  most  likely  to  occur,  that  is,  g (N^ , . . . ,Nm)  =  max  g ( » • • • » nm) • 
Defining  the  function  G(q)  =  g  (y^y  (t1  ;  q)  ,  .  .  .  ,  ym-y  (tm;  q)  )  we 
therefore  might  seek  a  value  q  that  yields  a  maximum  for  G. 

For  technical  reasons,  it  is  more  convenient  to  maximize  the 
natural  log  of  this  function  so  that  one  defines  the  likelihood 
function 

m 

L  (q)  =  zr\  G(q)  =  £  in  g  (y  .  -y  (t .  ;q)  ) 

i=l 

and  equivalently  seeks  a  maximizer  q  (called  an  MLE  for  q*)  of  L. 
In  certain  cases  (see  [15])  maximizing  L  is  completely  equivalent 
to  minimizing  the  least  squares  criterion  J  defined  by  (3.2). 

More  generally,  the  MLE  procedures  reduce  to  an  algorithm  to 
determine  a  solution  q  of  L' (q)  =  0. 


4 .  Inherent  difficulties  in  FDE  and  PDE  identification 

In  all  of  the  techniques  outlined  in  section  3  for  ordinary 
differential  equations,  we  noted  that  at  each  iterative  step  one 
usually  required  solution  of  the  original  system  (3.1)  as  well 
as  perhaps  solution  of  an  associated  variational  equation.  The 
system  (3.1)  and  any  linearized  variational  equation  involves 
systems  in  f inite-diment ional  state  space  Rn .  For  delay  systems 
and  distributed  parameter  systems  such  as  those  discussed  in 
connection  with  the  physiology  models  of  section  2,  one  must  deal 
with  infinite  dimensional  ’'state"  processes.  If  applied  directly 
to  such  systems  the  iterative  methods  of  section  3  would  entail 
huge  storage  requirements  and  would,  in  many  cases  of  practical 
interest,  prove  unwieldly  and  indeed  unfeasible. 

Even  if  one  had  unlimited  storage  capabilities  so  that  the 
above  mentioned  difficulties  might  effectively  be  circumvented, 
there  are  even  more  serious  problems  inherent  in  using  the  methods 
of  section  3  directly  for  certain  identification  problems  involving 
delay  systems.  As  we  saw  in  section  2,  it  is  often  very  important 
to  estimate,  among  other  parameters,  the  delays  in  a  functional 
differential  equation  model.  A  careful  inspection  of  the  techniques 
outlined  in  section  3  reveals  that  one  must  deal  with  the  derivatives 

or,  in  the  case  one  is  identifying  the  delays  in  x(t)  = 

3  X 

f (a , x (t) ,x (t-T .),... ,x  (t-T  )),  one  must  make  use  of  » —  as  well 

A-  \)  o  T  ^ 

as  .  Even  if  f  is  extremely  smooth,  these  derivatives  need 

o  Cl 

not  exist.  Consider,  for  example, 


I 


4 
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(4.1) 


x (t)  =  x  (t-x) ,  t  >  0 


x  (0)  =  <f>  (0)  ,  0  <  0 


where  the  initial  function  <t>  is  defined  by  <M9)  =  -1  <  0  <_  0, 

4>(6)  =  1  for  0  <_  -1.  For  t  <_  1  this  equation  has  solution 
x(t)  =  j  +  ~t  on  [0,t].  For  t  >  1,  say  t  =  1  +  e,  we  find 

the  solution  is  given  by  x(t)  =  ~  +  t  on  [0  ,e],  x(t)  =  ^  +  e  + 
^(t-c)  on  (e,l+r.].  If  one  then  fixes  t^  in  (0,1),  it  is 
quite  easy  to  demonstrate  that  t  -*■  x(t.;r)  is  not  differentiable 
at  t  =  1;  i.e.,  ~  (t.;l)  does  not  exist.  Hence  one  cannot 

a  T  1 

apply  directly  the  methods  outlined  in  section  3  for  identifica¬ 
tion  of  delay  parameters  in  examples  such  as  this  one.  The 
methods  we  describe  in  the  next  section  can  be  used  to  overcome 


such  difficulties  as  well  as  those  inherent  in  the  infinite- 
dimensionality  of  the  state  in  such  systems. 


5 •  Techniques  for  FDE  and  PDE  identification 

We  turn  finally  to  methods  for  parameter  estimation  in 
problems  involving  functional  differential  or  partial  differential 
equations-  We  shall  for  ease  in  exposition  assume  a  least-squares 
formulation  of  the  basic  identification  problem.  Thus  we  seek  to 
minimize  J  over  Q  with 


(5.1) 


J  (q) 


,  m 

-  I  l 

z  i=l 


y(ti;q)-yil 


as  in  section  3  except  now  y  is  the  output  for  either  an  FDE  or 
a  PDE.  For  example,  we  might  have  y(t)  =  Cx(t)  with  x  the 
solution  of  an  underlying  delay  system  such  as  (2.1)  or  y(t)  = 
col (Cu (t ,x1 ) , . . . ,Cu (t ,x  ) )  with  u  the  solution  of  a  system  of 
partial  differential  equations.  Before  turning  to  specific  cases 
which  differ  slightly  in  detail,  we  outline  briefly  the  fundamental 
ideas  involved  in  development  of  the  methods  we  shall  discuss. 

We  first  rewrite  the  system  (FDE  or  PDE)  as  an  abstract 
equation  in  an  appropriately  chosen  Hilbert  space  Z: 


(5.2) 


z  (t)  =  _Q^(q)z  (t)  +  G  (t ) 
z (0)  =  zQ. 


Here  3#  may  be  either  a  linear  or  nonlinear  operator  depending  on 
parameters  q  €  Q.  The  identification  problem  is  reformulated  in 
a  corresponding  manner  so  that  one  seeks  q  €  Q  that  minimizes 


where  y(t)  =  r(z(t^;q))  is  an  appropriately  defined  output. 

The  problem  now  has  the  appearance  of  a  parameter  identifica¬ 
tion  problem  for  an  ordinary  differential  equation  except,  of  course, 
we  are  working  in  an  infinite  dimensional  state  space  Z  instead 
of  Rn  as  in  section  3.  We  attempt  to  reduce  the  problem  for 
(5.2),  (5.3)  to  an  approximate  one  in  a  finite  dimensional  space. 

The  approximation  idea  we  employ  is  a  classical  one  commonly 
referred  to  as  the  Ritz-Galerkin  technique.  We  choose  subspaces 
ZN  of  Z  with  projections  PN:Z  ->  ZN  and  solve  a  least-squares 
problem  for  the  system  as  it  is  "projected"  or  approximated  in 
these  subspaces.  That  is,  we  seek  to  minimize 

(5.4)  JN(q)  =  \  l  |r(zN(ti;q))  -  y. | 2 

i=l 

over  Q  subject  to  the  approximate  system 


(5.5) 


zN(t)  =  V*(q)zN(t)  +  PNG  (t ) 


ZN(0)  =  PNzq. 


Our  hope,  of  course,  is  that  by  clever  choices  of  ZN,  PN  ancJ 
_t/N,  we  might  be  able  to  insure  that  zN(t)  -*■  z(t)  and  that 

solutions  q^  of  minimizing  (5.4)  subject  to  (5.5)  will,  as 


N  -v  oo,  approach  some  q  in  Q  that  is  a  solution  of  the  original 
problem  for  (5.1). 


We  describe  two  particular  choices  for  such  schemes: 

In  the  case  of  PDF's  we  outline  developments  for  spline  subspaces 
N 

Z  while  for  PDE '  s  we  report  on  results  involving  modal  (eigen- 

N 

function)  subspaces  Z  . 


where  xfc  is  the  usual  notation  for  the  function  >'<  -  = 

x(t+0),  -r  =  t  «  o  ^  0,  representing  a  functional  dependence 

V 

on  the  past  in  our  system.  In  general  we  seek  to  determine  from 

data  (observations  on  C  x)  values  for  the  parameters 
q  =  ( a  ,  t  ^  ,  .  .  .  ,  t  ^ )  .  By  choosing  Z  =  Rn  *  I^i-r.O.-R0)  with  an 
appropriate  inner  product  ,  ,  >  and  employing  as  state 
z(t)  =  (x(t),x  ),  where  x  is  the  solution  of  (5.6),  we  can  re¬ 
write  (5.6)  as  an  equivalent  abstract  system  (5.2)  with  &\q) z  = 
(f  (at ,  <l>  (0)  ,  .j,/ ,j,  (-t  ,  .  .  .  ,  Hi  (-t  )  )  ,  it- ' )  for  z  =  U(0),ij,)  in  a 

judiciously  chosen  domain  in  Z  (for  details  see  [3])  and 

zQ  «  (* (0) ,♦)  . 

In  describing  the  spline  approximations  we  shall  for  ease 


in  exposition  restrict  our  considerations  to  first-order  (piece- 
wise  linear)  spline  approximations  for  a  scalar  equation  of  the 
form  (5.6).  A  general  theory  for  arbitrary-order  spline  approxima 
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tions  is  given  in  [7]  for  linear  FDE  and  in  [3],  [20]  for  nonlinear 
FDE .  We  shall  also  assume  (again  only  for  ease  in  exposition)  that 
the  delays  Ti'**‘,Tv  an<^  initial  data  <$>  are  known.  In  this 
case,  the  theory  for  nonlinear  system  parameter  estimation  via 
spline  approximations  is  given  in  [3] ,  while  the  general  nonlinear 
theory  for  estimation  of  parameters  including  delays  and  initial 
data  (which  is  technically  more  difficult  to  describe)  is  given  in 
[5]  . 

Let  e j ,  j  =  0,1,... N,  be  the  classical  first-order  spline 
functions  on  [-r,0]  satisfying  e^(tj)  =  <S ^ j  ,  where  5^j  is 
the  Kronecker  symbol  and  t^1  =  -  .  That  is,  we  partition  [-r,0] 


-N  tN  1 
'j'  j-1' 


0  at  t.+1 


where 


and 


„N 

'j- 


N 


N 


N 

'3 


eN 

3 

to 

be  1  at 

tN 

3  ' 

N 

3+1' 

and  [t j , t j 

Let 

zN 

=  span{BQ, 

•  *  * 

Then 

zN 

is  an  N 

+  1 

N(t) 

e  z 

N 

,  we  can 

,N , 


N 


write  zN(t)  =  £  w^tJB1?  where  wN  are  the  generalized  Fourier 

n=n  3  3  3 

J  u  N  .  r  N  N  ■, 

coefficients  or  "coordinates"  of  z  relative  to  {Bq,...,8n). 

Let  PN  be  the  orthogonal  projection  of  Z  onto  ZN  which  is 

characterized  by  the  condition 


where 


a  =  col  («q  ,  ctj  i  .  .  •  i  a^j)  and  is  the  (N+l)  x  (N+l) 

matrix  with  elements  (  gj  )  .  For  first-order  splines  this 

is  a  tridiagonal  matrix  and  equation  (5.8)  is  easily  solved  for 

aN  (for  higher  order  splines  the  analogue  of  KN  is  a  banded 

N 

matrix  so  that  computing  P  is  also  easily  done  in  those  cases) . 

Next  we  define  -0^N(q)  =  PN_o^(q)PN  which  is  readily  computed 
since  we  know  ja/  and  know  how  to  compute  PN.  The  approximating 

equation  (5.5)  for  (5.6)  thus  reduces  to  an  ordinary  differential 

N  ,  N  N  N  N 

equation  in  w  =  col(Wg,  w^,...,wN),  the  coordinates  of  z  . 

One  obtains 

wN(t)  =  AN(q)wN(t)  +  (KV1  col  (g(t)  ,0,  .  .  .  ,0) 

(5.9) 

wN(0)  =  (KN)"1hN((4,(0)  ,$)) 

N 

where  A  (q)  is  the  (in  general  nonlinear)  operator 

AN(q)wN(t)  =  (KN)_1hN(i^(q)  (  f  W^t)^)). 

j=0  3  3 

The  problem  of  minimizing  (5.4)  subject  to  (5.5)  thus  reduces  to 
an  easily  implemented  problem  for  an  ordinary  differential  equation 
(5.9)  for  which  the  standard  techniques  discussed  in  section  3  can 
readily  be  employed. 

We  have  tested  these  ideas  on  a  number  of  examples  and 
present  now  a  small  sample  of  numerical  results  to  illustrate  our 
findings.  In  each  case  an  IMSL  package  employing  the  Levenberg- 


! 

:  i 

Marquardt  algorithm  (see  (3.13))  was  used  to  solve  the  ordinary 
differential  equation  approximating  identification  problems. 

1 

I 

|  Example  5.1. 

We  return  to  the  example  of  section  4  (see  the  comments 
following  (4.1)) 


x  ( t )  =  x(t--r)  ,  t  >  0, 

X  (  0)  =  <j>  (  6)  , 

with  <(>(0)  =  j  for  -1  <  e  <_  0,  <j>(0)  =  1  for  0  <_  -1.  We 


seek  to  identify  a 

"true"  value  t 

=  1.0 

for  the  delay  parameter 

even  though  ~  (t4 

;  1 )  does  not  exist. 

We  present  the  numerical 

.  — N 

findings  in  a  tabular  form  with  t  the 

"converged"  value  (from 

the  iterative  IMSL 

package)  of  t 

for 

the  index  of  approximation 

N.  Start-up  values 

of  xN'°  =  .9 

were 

used  in  each  case  below. 

Very  similar  results  were  obtained 

with 

start-ups  tN'®  =  .2  and  3.0 

The  true  analytical 

solution  was  used  as 

"data"  on  [0,1]. 

T* 

No.  of 

N 

IMSL  Iterates 

2 

.8031 

3 

4 

.9080 

3 

8 

.9686 

5 

16 

.9950 

6 

32 

1.0010 

7 

True 

Value 

1.0000 
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Example  5.2. 

Consider  the  multiple  delay  example 


x(t)  =  2x(t)  +  ax (t-T )  +  x(t-2),  t  >  0, 


x(0)  =  1, 

-2  £  0  £  0 , 

for  which  "data"  corresponding  to  "true 

"  values 

a  : 

=  3.0,  t  =  1 

of  the  parameters  q  =  (a, 

t )  are  easily 

generated 

on 

[0,3].  In 

the  table  below  start-up  values  aN ' ^  = 

,  e-  N,0 

^  «  b  f  x 

= 

1.3  were 

used  for  each  N.  For  each  N  the  IMSL  package 

produced  a 

"converged"  estimate  in  10 

iterations . 

N 

— N 
a 

-N 

T 

2 

3.888 

.9472 

4 

3.243 

.9851 

8 

3.062 

.9961 

16 

3.0159 

.9991 

32 

3.0040 

.9998 

True 

Values 

3.0 

1.0 

Example  5.3. 

Finally  we  consider 

the  nonlinear 

equation 

x(t)  =  2x(t) 

+  5x(t-i)  + 

ax  (t-2 ) 

4- 

>  0 

1+x (t-2 ) ' 

U 

'  u  r 

x (0 )  =  1 


... _ 
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in  which  we  seek  to  estimate  q  =  (a,x) .  An  independent  numerical 
method  can  be  used  to  generate  "data"  on  [0,4]  corresponding  to 
"true"  values  q  =  (a,x)  =  (3. 0,1.0).  Again  the  approximation 
methods  described  above  worked  quite  well  when  applied  to  this 
example.  For  N  =  40,  the  IMSL  package  converged  in  21  iterations 
to  values  q^®  =  (cT40^40)  =  (3 . 0058 ,  .  995)  from  start-up  values 
qN,°  =  (1.0,  .5)  . 


We  conclude  our  brief  discussion  of  the  spline  methods  for 
FDE  parameter  identification  with  several  remarks.  First,  higher 
order  spline  methods  are  easily  implemented  in  the  manner  outlined 
above.  Computational  experiments  by  F.  Kappel  and  colleagues  in  Graz 
reveal  that  (at  least  in  the  case  of  cubic  splines)  the  resulting 
banded  matrices  pose  no  conceptual  or  practical  difficulties  in 
implementation.  Vector  systems  are  also  easily  treated  using  these 
approximations  (see,  for  example,  the  column  reactor  identification 
example  where  n  =  8  in  [2]).  Proofs  of  convergence  for  the 
fundamental  approximations  in  the  case  of  linear  systems  are  given 
via  use  of  abstract  approximation  theorems  from  semigroup  theory 
(a  Trotter-Kato  approximation  theorem)  in  [7]  where  error  estimates 
for  the  "state"  convergence  (z  ->  z)  are  also  given.  The 
fundamental  theory  for  nonlinear  systems  is  given  in  [31,  [20]. 

As  one  might  expect,  one  obtains  that  the  piecewise-linear  elements 
yield  0(i)  estimates  of  convergence,  cubic  elements  converge  like 
O  { )  ,  etc.  We  have  made  numerous  computational  tests  (see  [3], 

[4],  [7],  [20])  with  these  methods  involving  state  approximation 
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only,  parameter  identification  problems,  and  optimal  control 
problems.  Our  experience  has  been  similar  to  that  realized  in  the 
use  of  finite-element  methods  for  certain  boundary  value  problems 
in  that  we  observe  convergence  rates  that  are  better  than  predicted 
by  the  theory.  For  example,  the  piecewise  linear  elements  generate 
schemes  that  appear  basically  second  order  in  behavior  (i.e., 
errors  like  0(^-y))  and  this  is  observed  not  only  in  convergence 
of  the  states  zw  ->  z  but  also  in  parameter  estimates  and  in 
optimal  controls  and  performance  criteria. 

PDE  and  modal  approximations 

To  facilitate  our  discussion  of  modal  approximation  schemes 
we  consider  simple  scalar  nonlinear  parabolic  equations 

(5.10)  ^  -  k-  ox^fx5  +  q2U  +  f  (q4  -t,x,u)  ,  0  <  x  <  1,  t  >  0, 

with  initial  condition 

(5.11)  u  (0  ,x)  =  q3<j>  (x)  , 
and  boundary  conditions 

(5.12)  u  (t , 0 )  =  u ( t , 1 )  =  0, 

where  p  and  k  satisfy  the  usual  Sturm-Liouville  conditions 


(PfP'/k  continuous  with  p  >  0,  k  >  0).  A  general  approximation 
framework  (again  based  on  Trotter-Kato  type  approximation  theorems 
from  semigroup  theory)  that  includes  as  special  cases  both  nonlinear 
parabolic  and  hyperbolic  vector  systems  (with  quite  general  boundary 
conditions  allowed) is  detailed  in  some  of  our  joint  efforts  with 
K.  Kunisch  [8]  ,  [9‘J  .  We  shall  here  only  briefly  indicate  how  one 
chooses  the  spaces  and  operators  in  the  case  of  (5 . 10) - (5 . 12)  so 
that  we  are  in  a  special  case  of  the  general  Ritz-Galerkin  ideas 
outlined  above. 

To  rewrite  (5 . 10 ) -  ( 5 . 12 )  as  an  abstract  Cauchy  problem  we 

rl 

choose  Z  =  L_  (0,1)  with  inner  product  =  $  (x)  \p  (x) k  (x) dx 

J0 

and  define  (on  an  appropriately  chosen  domain  in  Z)  the  operator 
JZ?(q)  by  _o/(q)  i)/  =  ^—(pij/1)'  +  q2'l>*  Then  with  a  carefully  defined 
nonlinear  map  F  (see  [8])  we  find  that  (5 . 10) - (5 . 12)  is  equivalent 
in  some  sense  (one  which  is  sufficient  for  our  purposes)  to  the 
equation  in  Z 

z(t)  =  afXqJz  +  F(q,t,z(t)),  t  >  0 

(5.13) 

z  (0)  =  q3^> . 

Sturm-Liouville  operator  theory  can  be  employed  to  obtain  a  complete 

CO 

orthonormal  set  of  eigenfunctions  { Y . }  for  q)  with 

3  j=1 

q  =  (1,0,..., 0)  and  these  are  used  to  define  modal  approximation 

N  N 

subspaces  Z  =  span  , . . . .  One  then  defines  P  and 
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j^N(q)  in  a  manner  similar  to  that  for  the  FDE  approximations 

discussed  above;  i.e.,  jaf^fq)  =  PNja^(q)PN  where  PN  is  the 

N  N  N 

orthogonal  projection  Pz=  £  (z,  .  )  V .  of  Z  onto  Z  .  An 

j=l  3  3 

approximating  ordinary  differential  equation  ((5.5)  with  G 
replaced  by  F (q, t , zN (t ) )  )  is  then  used  to  formulate  and  solve 
corresponding  approximate  identification  problems  in  a  (by-now) 
obvious  manner. 

We  have  tested  the  methods  proposed  here  for  both  parabolic 
and  hyperbolic  examples  and  present  a  sample  of  our  preliminary 
numerical  findings  for  parabolic  equations.  "Data"  for  the 
examples  below  were  generated  by  employing  an  independent  method 
(Crank-Nicolson)  to  solve  numerically  the  equations  with  the 
parameters  q  set  equal  to  their  "true"  values  q. 

Example  5.4 

We  consider  the  equation 


Ut  ■  -1  ux*  +  ^2U 


with  boundary  conditions  (5.12)  and  initial  condition  (5.11)  where 

q3  =  1  and  <j>  is  taken  as  the  piecewise  linear  continuous  ("roof") 

function  that  is  linear  on  f0,j]  and  [^l]  and  satisfies 

<f  (0)  =  <p(l)  =  0,  <#>  (i)  =1.  A  true  value  of  q2  =  .8  was  chosen. 

4  0 

For  N  =  4,  a  start-up  value  of  q2'  =  .25  was  selected  and  the 

IMSL  package  for  the  approximating  problem  produced  a  converged 
—4 

value  q2  =  .8001. 


.8001. 
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We  considered  then  the  equation 


ut  “  qluxx  +  -2u 

with  the  same  initial  and  boundary  conditions.  A  true  value 

—  4  0 

=  .1  was  taken.  For  N  =  4  and  start-up  value  '  =  .25, 

—4 

the  estimate  q^  =  .0999  was  obtained. 


Example  5.5. 

The  nonlinear  equation 


Ut  •luxx  +  q4  1+u 


with  boundary  conditions  (5.12)  and  initial  condition  (5.11) 
where  <j>  is  as  in  Example  5.4  (except  <j>  ( ^)  =  4)  was  investigated 

with  "true"  parameter  values  (q^,q^)  =  (5. 0,2.0).  With  start-up 

MONO  , 

values  (q^'  ,q4'  )  =  (1.0, 0.0)  the  following  estimates  were 

obtained. 


N 

-N 

q3 

-N 

q4 

4 

5.2274 

1.9254 

8 

5.1374 

1.9741 

16 

5.0668 

1.9845 

True 

Values 

5.0 

2.0 

The  theory  for  modal  approximations  in  the  case  of  hyperbolic 


equations  is  only  slightly  more  involved  than  that  sketched  above 
for  parabolic  systems.  For  example,  an  equation  of  the  form 

utt  =  qiuxx  +  q2ut  +  q3u  +  f(g4»t*x»u) 

can  be  written  as  a  first  order  vector  system  employing  appropriate 
Sobolev  spaces  (e.g.,  Z  =  Hq  x  i,  ) 


d 

dt 


* 

u 

kut 

qlA  +  q3  q2 


u 

+ 

0 

u. 

f 

tj 

j 

and  a  convergence  analysis  carried  out  with  the  aid  of  general 
spectral  theorems  and  semigroup  approximation  results.  For  details 
as  well  as  numerical  examples,  see  [8]  ,  [9] . 


Acknowledgement ;  Two  of  my  students,  Patti  Daniel  and  Jim  Crowley, 
have  contributed  significantly  to  the  numerical  studies  outlined 
in  section  5  and  I  would  like  to  express  my  deep  appreciation  for 
their  continuing  efforts  on  these  and  related  investigations. 


REFERENCES 


[1]  H.T. Banks,  On  identification  problems  arising  in  biomedical 
modeling,  Sdminaires  IRIA:  analyse  et  controle  de  systemes, 

1972,  33-37. 

[2]  H.T. Banks,  Computational  difficulties  in  the  identification 
and  optimization  of  control  systems,  Proc,  Workshop  on 
Applied  Control  Theory  to  Renewable  Resource  Management  and 
Ecology ,  Univ.  Canterbury,  Christchurch,  New  Zealand, 

Jan.  7-11,  1980,  Springer-Verlag ,  to  appear. 

[3]  H.T. Banks,  Identification  of  nonlinear  delay  systems  using 
spline  methods,  Proc.  International  Conference  on  Nonlinear 
Phenomena  in  the  Math  Sciences,  Univ.  Texas,  Arlington, 

June  16-20,  1980,  Academic  Press,  to  appear. 

[4]  H.T. Banks,  J. A. Burns,  and  E.M. Cliff,  A  comparison  of  numerical 
methods  for  identification  and  optimization  problems  involving 
control  systems  with  delays,  LCDS  Tech.  Rep.  79-7,  November 
1979,  Brown  University,  Prov.  R.I. 

[5]  H.T. Banks  and  P. Daniel,  Estimation  of  delays  and  other 
parameters  in  nonlinear  functional  differential  equations, 
to  appear. 

[6]  H.T. Banks  and  G.M.  Groome,  Convergence  theorems  for  parameter 
estimation  by  quasilinearization,  J.  Math.  Anal.  Appl .  42^ 
(1973),  91-109. 

[7]  H.T. Banks  and  F.Kappel,  Spline  approximations  for  functional 
differential  equations,  J.  Differential  Eqns.  34 (1979) , 

496-522. 

[8]  H.T. Banks  and  K.  Kunisch,  Parameter  estimation  techniques 
for  nonlinear  distributed  parameter  systems,  Proc.  Interna¬ 
tional  Conference  on  Nonlinear  Phenomena  in  the  Math.  Sciences, 
Univ.  Texas,  Arlington,  June  16-20,  1980,  Academic  Press, 

to  appear. 

19]  H.T. Banks  and  K.  Kunisch,  An  approximation  theory  for  nonlinear 
partial  differential  equations  with  applications  to  identifica¬ 
tion  and  control,  to  appear. 

[10]  H.T.  Banks  and  P.J.  Palatt,  Mathematical  Modeling  in  the 
Biological  Sciences,  LCDS  Lecture  Notes  75-1,  June  1975, 

Brown  University,  Providence,  R.  I. 


43. 


[11]  J. Brocas  and  Y.Cherruault ,  Association  des  systhmes  trach6ens 
et  circulatoires  dans  le  transport  des  gaz  respiratoires 

chez  l'insecte  adrien,  International  J.  Bio-Med.  Computing 
4  (1973) ,  173-204. 

[12]  J. Brocas  and  Y.  Cherruault,  Etude  sur  modfele  mathdmatique 
des  echanges  gazeux  alveolo-capillaires  -  Role  joue  dans  ces 
echanges  par  le  presence  d'un  gaz  inerte,  Math.  Biosci.  £8 
(1973),  313-333. 

[13]  J.Comroe,  Physiology  of  Respiration,  Year  Book  Med.  Publ., 
Chicago,  1965. 

[14]  H. Cramer,  Mathematical  Methods  of  Statistics,  Princeton 
Univ.  Press,  Princeton,  N.J.  1946. 

[15]  R.Deutsch,  Estimation  Theory,  Prentice-Hall,  Englewood 
Cliffs,  N.J.  1965. 

[16]  R. Fletcher  and  C.M. Reeves,  Function  minimization  by  conjugate 
gradients.  Computer  J.  7(1964),  149-154. 

[17]  F.Grodins,  J. Buell,  and  A. Bart,  Mathematical  analysis  and 
digital  simulation  of  the  respiratory  control  system, 

J.  Applied  Physiol.  22(1967),  260-276. 

[18]  G.M.Groome,  Identification  of  Dynamical  Systems,  Ph.D.  Thesis, 
Brown  University,  June,  1972. 

[19]  A. C. Jackson  and  H.T.  Milhorn,  Digital  computer  simulation 
of  respiratory  mechanics,  Comp.  Biomed.  Res.  £(1973),  27-56. 

[20]  F.Kappel,  Spline  approximation  for  autonomous  nonlinear 
functional  differential  equations,  preprint,  June  1980,  to 
appear . 

[21]  J.Kowalik  and  M.R. Osborne,  Methods  for  Unconstrained 
Optimization  Problems,  American  Elsevier,  New  York,  1968. 

[22]  B.W.Lindgren,  Statistical  Theory,  Macmillan,  New  York,  1962. 

[23]  D.G.Luenberger,  Optimization  by  Vector  Space  Methods, 

John  Wiley  and  Sons,  New  York,  196$ 

[24]  M.C. Mackey  and  L.  Glass,  Oscillation  and  chaos  in  physiological 
control  systems,  Science  197 (July,  1977),  287-289. 


[25]  H.T.  Milhorn,  The  Application  of  Control  Theory  to 

Physiological  Systems,  Saunders,  Philadelphia,  1966,  Chapter 
15. 


[26]  H.T.Milhorn,  R. Benton,  R.Ross,  and  A. C. Guyton,  A  mathematical 
model  of  the  human  respiratory  control  system,  Biophysical 

J.  5  (1965)  ,  27-46. 

[27]  H.T.Milhorn  and  A. C. Guyton,  An  analog  computer  analysis  of 
Cheyne-Stokes  breathing,  J.  Appl.  Physiol.  £0(1965),  328-333. 

[28]  A. M. Mood  and  F. A. Graybill ,  Introduction  to  the  Theory  of 
Statistics ,  McGraw-Hill,  New  York,  1963. 

[29]  J.M. Ortega  and  W.C . Rheinboldt ,  Iterative  Solution  of  Non¬ 
linear  Equations  in  Several  Variables,  Academic  Press,  New  York,  1970. 

[30]  P.J.Palatt,  H.Sackin  and  R. I. Tanner,  A  hydrodynamic  model  of 
a  permeable  tubule,  J.  Theor.  Biol.  £4(1974),  287. 

[31]  P.J.Palatt  and  G.M.Saidel,  A  note  on  a  simple  counter-current 
exchange  system,  T.-I.-T.  J.  Life  Sci.  £(1971),  123. 

[32]  P.J.Palatt,  G.M.Saidel  and  J.Macklin,  Transport  processes 
in  the  renal  cortex,  J.  Theor.  Biol.  ££(1970) ,  251-274. 

[33]  E.  Selkurt,  Physiology ,  Little,  Brown  and  Co.,  Boston,  1966. 

[34]  D.M.Wiberg,  J.W.Bellville,  0. Brovko,  R. Maine,  and  T.C.Tai, 
Modeling  and  parameter  identification  of  the  human  respiratory 
system,  IEEE  Trans.  Auto.  Control,  AC-24  (1979),  716-720. 


■4 


